(19) Europäisches Patentamt
     European Patent Office
     Office européen des brevets

(11) EP 0 902 379 A2

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication: 17.03.1999 Bulletin 1999/11

(21) Application number: 98307334.7

(22) Date of filing: 10.09.1998

(51) Int. Cl.6: G06F 17/24, G06F 17/21, G06F 17/27

(84) Designated Contracting States:
     AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE
     Designated Extension States:
     AL LT LV MK RO SI

(30) Priority: 15.09.1997 US 929427

(71) Applicant: XEROX CORPORATION
     Rochester, New York 14644 (US)

(72) Inventors:
     • Schilit, William N.
       Palo Alto, California 94304 (US)
     • Price, Morgan N.
       Palo Alto, California 94306 (US)
     • Golovchinsky, Gene
       Palo Alto, California 94306 (US)
     • Weiser, Mark D.
       Palo Alto, California 94301 (US)

(74) Representative: Skone James, Robert Edmund
     GILL JENNINGS & EVERY
     Broadgate House
     7 Eldon Street
     London EC2M 7LH (GB)

(54) A method and system for organizing documents based upon annotations in context 



(57) A document organizing system extracts annotations made to a
document along with the context surrounding each annotation and
organizes the annotations based upon the annotation attributes and/or
context. The annotations are created by grouping marks based upon
their proximity in time and space. The document is segmented to
determine a minimum context associated with each annotation. A list
of the annotations sorted by the attributes is then displayed to the
user. The context provided by the invention for each annotation
allows the user to fully understand the annotation.

Description

[0001] This invention is directed to a document organizing system. In
particular, this invention is directed to a method and a system for
organizing documents based upon the context of annotations made to
those documents.

[0002] When people read paper documents, they often make annotations
to highlight interesting or controversial passages and to record
their reactions. Common annotations include margin notes, vertical
bars, stars, circles, underlines, highlights, etc. Two advantages of
annotating directly on the page are its low overhead and convenience.
One disadvantage is that the recorded information is hidden and
inaccessible until the reader returns to the specific page in the
specific document.

[0003] To avoid this problem, some readers use a separate reading
notebook to record their annotations. A reading notebook is useful
because it provides a separate summary of what the user has read
along with any commentary. The advantage of a reading notebook is
that it permits a quick review of the material because it generally
has less information to browse and search than the original document.
One disadvantage of a reading notebook, however, is that the reader
must recreate the context for each note to fully understand the
meaning of each note.

[0004] Readers also use note cards to organize notes. The advantage
of a note card system is that the cards can be easily reorganized.
However, as with a reading notebook, unless the reader recreates it,
there is no context available to permit the user to fully understand
the notes. Additionally, each note must be categorized onto the
correct card before it can be recorded.

[0005] Handwritten notes and keywords are used in a system known as
"Marquee" to index video. This system is described in "Marquee: A
Tool for Real-Time Video Logging", K. Weber et al., Proceedings of
CHI '94, April 1994, pp. 58-64, incorporated herein by reference in
its entirety. In "Marquee", notes are synchronized to a video string
with time zones that are created with horizontal line gestures.
Keywords are identified by the user by circling the words and notes
that the user has selected as keywords. The keywords are assigned to
the time zone in which the keyword is created. Keywords also may be
assigned directly by the user by typing the keyword in manually.
Because the keywords are associated with time, the user can view an
index of time zones and go directly to the video by selecting a time
zone using an index of the previously identified keywords or
annotations. Although "Marquee" uses annotations to index a video
document, it does not combine the annotations with the document in a
visual way. "Marquee" is thus analogous to notetaking in a separate
notebook rather than on the document itself.

[0006] "Dynomite" is a free-form digital "ink" notebook. The digital
ink notebook is a pen-based computer that the user controls by
writing with a pen directly on the screen of the computer. The
computer senses the location and the positions traversed as the pen
moves across the display and assigns ink marks that correspond with
the positions of the pen. These ink marks are called digital ink
because the ink is described by the computer digitally. Dynomite
extracts the ink, assigns properties to each ink mark and can present
a list of the ink marks sorted by the assigned properties. This list
is known as an ink index. This system is described in co-assigned and
co-pending EP Patent Application No. 98302127.0, entitled "System for
Capturing and Retrieving Audio Data and Corresponding Handwritten
Notes", and "Dynomite: A Dynamically Organized Ink and Audio
Notebook", by L. Wilcox et al., in CHI '97 Conference Proceedings,
ACM Press, 1997, pp. 186-193, incorporated herein by reference in
their entireties. This ink index shows a "type" of the "ink" along
with a time stamp and provides links to the original notebook pages.
Dynomite's ink index provides "ink" marks linked to the corresponding
full notebook page. However, Dynomite organizes only the ink notes
themselves and not the associated information.

[0007] "ComMentor" is a platform for shared annotations that attaches
text-based comments to locations within web documents. This system is
described in "Shared Web Annotations as a Platform for Third-Party
Value-Added Information Providers: Architecture, Protocols, and Usage
Examples", by M. Roscheisen et al., Technical Report
STAN-CS-TR-97-1582, Stanford Integrated Digital Library Project,
Computer Science Department, Stanford University, November 1994,
updated April 1995, incorporated herein by reference in its entirety.
Annotations are grouped into sets. A user can filter these sets and
tour through documents within a set. A tour window shows a list of
annotations, each annotation shown with the document title of the
annotated document and a number of annotation attributes. Clicking on
the annotation causes the display to jump to the source document at
the position of the annotation. ComMentor uses filtered annotations
to produce lists of read documents, but does not support paper-like
annotations or present lists of annotations in context.

[0008] Classroom 2000 is a system for capturing a lecture using
recorded audio, prepared visual materials and handwritten notes made
on a display overlay of viewgraphs. This system is described in
"Classroom 2000: Enhancing Classroom Interaction and Review", by
G. Abowd et al., in Proceedings of CSCW '96, March 1996, incorporated
herein by reference in its entirety. Searching the text in the
viewgraphs retrieves the viewgraphs along with the overlaid notes.

[0009] The Freestyle system, which was developed at Wang
Laboratories, is a mechanism for sketching and writing on screen
snapshots or on sheets of electronic paper. Freestyle records cursor
movement and audio as well as the handwriting. This system is
described in "Rapid Integrated Design of a Multimedia Communication
System, and Human-Computer Interface Design", E. Francik, Marianne
Rudisill et al. (editor), Morgan Kaufman Publishers, Inc., 1996,
incorporated herein by reference in its entirety. The result is a
dynamic multimedia message that can be mailed to others. Freestyle
does not provide the ability to organize the handwritten annotations.

[0010] The PENPOINT operating system for pen-based computers
recognizes pen gestures for editing and allows arbitrary "ink" marks
to be placed on top of any document using an "acetate layer". This
system is described in "The Power of PENPOINT", by R. Carr et al.,
Addison-Wesley, Inc., 1991, incorporated herein by reference in its
entirety. Although both Freestyle and PENPOINT support free-form
document annotation, neither provides any way to retrieve documents
based upon those annotations.

[0011] In 1945 Vannevar Bush described a vision of a mesh of trails
running through a mechanized private file and library, or memex, in
"As We May Think", in Atlantic Monthly, July 1945, pp. 101-108,
incorporated herein by reference in its entirety. These trails were
produced as part of the reading activity, and provided a way to
create and share personal organizations of information. Bush's
visions were seminal in the development of hypermedia systems such as
Engelbart's NLS and the World Wide Web. However, hypermedia systems
have focused on sharing, browsing and more explicit authoring of
links, not on personal organization and annotation.

[0012] Thus, an annotation system for electronic documents is needed
that combines the advantages of marking directly on a document with
quick accessibility and the flexible organization of marking on note
cards or in a notebook.

[0013] This invention provides a system and method for using digital
"ink" for annotations in context to organize a reader's activities.
The system and method of this invention extract the contents
surrounding and underlying a reader's annotations and present this
information to the reader with links to the full context. The
annotations in context provided by the system and method of this
invention permit flexible, low-overhead organization of material
without adding to the effort of reading and notetaking.

[0014] These and other features and advantages of this invention are
described in or are apparent from the following detailed description
of the preferred embodiments.

[0015] The preferred embodiments of this invention will be described
in detail, with reference to the following figures, wherein:

Fig. 1 is a block diagram of the document organizing system of this
invention;

Fig. 2 is a flow chart outlining the control routine of one
embodiment of this invention;

Fig. 3 shows a document annotated according to this invention;

Fig. 4 shows the annotated portions of the document of Fig. 3;

Fig. 5 shows another view of the annotated document of Fig. 3; and

Fig. 6 is a flow chart outlining the annotation control routine of
one embodiment of this invention.

[0016] Fig. 1 is a block diagram of one embodiment of the electronic
document organizing system 10 of this invention. The system 10 has a
processor 12 communicating with a display 14, a first storage device
16, a second storage device 18 and an input/output interface 20. The
first storage device 16 stores a document 22 displayable on the
display 14. The input/output interface 20 communicates with any
number of conventional input/output devices 24 such as a mouse 26, a
keyboard 28 and/or a pen-based device 30. A user manipulates the
input/output devices 24 to annotate the document 22 when displayed on
the display 14. The system 10 then stores these annotations 32 in the
second storage device 18.

[0017] As shown in Fig. 1, the system 10 is preferably implemented
using a programmed general purpose computer. However, the system 10
can also be implemented using a special purpose computer, a
programmed microprocessor or microcontroller and any necessary
peripheral integrated circuit elements, an ASIC or other integrated
circuit, a hardwired electronic or logic circuit such as a discrete
element circuit, a programmable logic device such as a PLD, PLA, FPGA
or PAL, or the like. In general, any device capable of implementing a
finite state machine that in turn implements the flowchart shown in
Fig. 2 can be used to implement the system 10.

[0018] Additionally, as shown in Fig. 1, the memories 16 and 18 are
preferably implemented using static or dynamic RAM. However, the
memories 16 and 18 can also be implemented using a floppy disk and
disk drive, a writable optical disk and disk drive, a hard drive,
flash memory or the like. Additionally, it should be appreciated that
the memories 16 and 18 can be either distinct portions of a single
memory or physically distinct memories.

[0019] Furthermore, it should be appreciated that the link 17
connecting the memory 16 and the processor 12 can be a wired or
wireless link to a network (not shown). The network can be a local
area network, a wide area network, an intranet, the Internet or any
other distributed processing and storage network. In this case, the
electronic document 22 is pulled from a physically remote memory 16
through the link 17 for processing in the processor 12 according to
the method outlined below. In this case, the electronic document 22
can be stored locally in a portion of the memory 18 or some other
memory (not shown) of the system 10.

[0020] The method of this invention includes three distinct
processes. First, the reader makes annotations on a displayed
document, and the annotations are extracted along with their context.
Second, the system associates a number of attributes with the
annotations in order to facilitate retrieval of the annotations
and/or the underlying annotated documents. Third, the reader views
collections of the annotations in context, where the collections are
organized by those attributes.

[0021] The system 10 records annotations on electronic documents. A
preferred interface for entering the annotations is a pen-based
computer, where the reader "writes" directly on the electronic
document. On a desktop computer without a pen, clicking a mouse in a
margin might create a text overlay box to create the annotation. The
system 10 may also support a number of different styles of marking.
For example, these styles can include swiping with a highlighter pen,
underlining text, vertical bars in the margin, circled regions, and
margin notes.

[0022] Fig. 2 is a flow chart outlining a control routine of one
embodiment of the invention. The control routine starts at step S100
and proceeds to step S110, where the user marks on the display of the
document with digital ink to annotate it. The control routine then
proceeds to step S120, where the system groups the marks of the
digital ink by time and/or space into collections of marks, each
collection being treated as a single annotation, as will be described
in more detail below. Next, the control routine proceeds to step
S130, where the system determines the minimum context for each
annotation. The minimum context determines how much of the document
that surrounds the annotation is to be associated with the
annotation. The minimum context may be predetermined as a user
preference to be a few words, a sentence, a paragraph or any other
amount in accordance with the user's preferences. The minimum context
can be displayed to the user as a bounding box around the minimum
context. The bounding box encloses the bounding region, and the
minimum context is defined as the content enclosed within the
bounding region of the corresponding annotation. Segmentation
procedures are applied to the document to divide it into graphical
components, e.g., lines of text, sentences, paragraphs and figures.
Given the minimum context, the control routine expands the context to
include all of the nearby segments. With this procedure, the context
may include a couple of lines, the surrounding sentence, or the
entire surrounding paragraph. Fig. 5 shows a bounding box 34 with the
context around a circle annotation 33.
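
As an illustration only, the context determination of step S130 can
be sketched as follows. The sketch assumes that segments and
annotations are represented by axis-aligned bounding boxes and that
"nearby" means within a fixed margin of the minimum context. The names
Box, Segment and context_for and the margin parameter are hypothetical
and do not appear in the embodiments described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float

    def intersects(self, other: "Box") -> bool:
        # True when the two rectangles overlap or touch.
        return (self.left <= other.right and other.left <= self.right
                and self.top <= other.bottom and other.top <= self.bottom)

    def grow(self, margin: float) -> "Box":
        # Enlarge the rectangle by `margin` on every side.
        return Box(self.left - margin, self.top - margin,
                   self.right + margin, self.bottom + margin)


@dataclass(frozen=True)
class Segment:
    text: str  # e.g. one line of text produced by segmentation
    box: Box


def context_for(annotation_box: Box, segments: List[Segment],
                margin: float = 5.0) -> List[Segment]:
    """Return the segments forming the context of one annotation."""
    # Minimum context: segments enclosed by or touching the annotation.
    minimum = [s for s in segments if s.box.intersects(annotation_box)]
    if not minimum:
        return []
    # Bounding region of the minimum context, widened by the margin
    # (the margin is an assumed stand-in for "nearby").
    region = Box(min(s.box.left for s in minimum),
                 min(s.box.top for s in minimum),
                 max(s.box.right for s in minimum),
                 max(s.box.bottom for s in minimum)).grow(margin)
    # Expanded context: every segment that falls within that region.
    return [s for s in segments if s.box.intersects(region)]
```

In this sketch a larger margin pulls in more of the surrounding
paragraph, while a margin of zero returns only the segments touching
the bounding box of the minimum context.
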
[0023] The annotation control routine is shown in Fig. 6. The control
routine starts at step S200 and proceeds to step S210, where the user
selects and opens an electronic document. The user then starts
marking on the document at step S220 and creates digital ink. The
system then determines at step S230 whether the new ink is close
enough in time and space to be associated with previous ink marks.
The system has time and space thresholds that may be predetermined or
adjusted in accordance with a user's preferences. If the system
determines at step S230 that the ink marks are not separate, the
system proceeds to step S240, where the user continues to mark. As
each mark is entered by the user, steps S230 and S240 are repeated
until the system determines that the new ink is separated enough by
time and space to proceed to step S250. At step S250 the ink marks
are grouped together as a single annotation, and at step S260 the
context for the annotation is determined and the attributes are
assigned to the annotation. The control routine then proceeds to step
S270, where the system determines whether a new mark has been input.
If a new mark has been input, the control routine returns to step
S230. If no new mark is entered, then at step S280 the annotations
are organized and displayed. The control routine then stops at step
S290.
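
The grouping of steps S230 to S250 can be sketched, under stated
assumptions, as follows. The sketch assumes each ink mark is reduced
to a timestamp and a single representative point, and that a mark
joins the current annotation only when it is within both a time
threshold and a distance threshold of the previous mark. The names
Mark and group_marks and the default threshold values are hypothetical
and are chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mark:
    t: float  # time the mark was made, in seconds
    x: float  # representative position of the mark on the page
    y: float


def group_marks(marks: List[Mark], max_gap_s: float = 2.0,
                max_dist: float = 50.0) -> List[List[Mark]]:
    """Group marks into annotations by proximity in time and space."""
    annotations: List[List[Mark]] = []
    for mark in sorted(marks, key=lambda m: m.t):
        if annotations:
            prev = annotations[-1][-1]
            close_in_time = mark.t - prev.t <= max_gap_s
            dist = ((mark.x - prev.x) ** 2 + (mark.y - prev.y) ** 2) ** 0.5
            close_in_space = dist <= max_dist
            if close_in_time and close_in_space:
                # Not separate: the mark extends the current annotation.
                annotations[-1].append(mark)
                continue
        # Separate enough: start a new annotation with this mark.
        annotations.append([mark])
    return annotations
```

Both thresholds are ordinary parameters here, which mirrors the
statement that they may be predetermined or adjusted to a user's
preferences.
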
[0024] For some special annotation formats such as those shown in
Fig. 5, the control routine determines the context slightly
differently. For margin bars 38 and other notes in the margin 38, the
system ignores the horizontal distance when finding nearby segments.
Thus, all vertically adjacent material is included in the contexts 40
and 42, respectively. For the line callouts and circle callouts, the
control routine determines the minimum contexts from the underlined
or circled text, etc., ignoring the ink in the callout gesture.
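
The margin-bar case can be sketched as a purely vertical overlap test.
This is a minimal illustration that assumes each segment is described
only by its text and its top and bottom coordinates; the names LineSeg
and margin_context are hypothetical.

```python
from typing import List, Tuple

# (text, top, bottom) of a segment; horizontal position is deliberately
# left out because it is ignored for margin annotations.
LineSeg = Tuple[str, float, float]


def margin_context(bar_top: float, bar_bottom: float,
                   segments: List[LineSeg]) -> List[str]:
    """Return the text of every segment vertically adjacent to a margin bar."""
    return [text for text, top, bottom in segments
            if top <= bar_bottom and bar_top <= bottom]
```

Because only the vertical extent is compared, a bar drawn far out in
the margin still selects the lines it runs alongside.
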
[0025] After the context of each annotation has been determined, the
control routine proceeds to step S140, where the control routine
assigns attributes to the annotations in at least one of three ways:
1) attributes entered by the user; 2) attributes inherited from the
document's attributes; and 3) implicit or explicit attributes derived
from the annotations themselves.

[0026] The user may enter attributes by interacting with a dialog
box, by selecting from a marking menu, or by selecting a special pen.
Example attributes derived from the annotations themselves include
"agree", "disagree", "good idea", and "follow-up". In addition,
annotation gestures such as "exclamation point" and "question mark"
may be interpreted to mean "good idea" and "questionable" by the
system as they are entered on the page. Attributes may also be
entered implicitly, the most important of which are the date and time
that the annotation was made and the page number of the annotation.
Another implicit attribute is the form of the annotation, e.g.,
highlight, circle, marginal note, etc.

[0027] Attributes may also be inferred from documents. In the system
10, the electronic documents are already associated with a variety of
attributes, such as creation date, author, provenance and title.
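
The three sources of attributes can be illustrated with the following
sketch. It assumes attributes are held in a plain dictionary and that
user-entered attributes take precedence when keys collide; that
precedence, the key names, and the names GESTURE_MEANINGS and
assign_attributes are assumptions made for illustration, not part of
the described embodiments.

```python
from datetime import datetime
from typing import Dict, Optional

# Gesture interpretations taken from the examples in the text above.
GESTURE_MEANINGS = {
    "exclamation point": "good idea",
    "question mark": "questionable",
}


def assign_attributes(user_attrs: Dict[str, object],
                      document_attrs: Dict[str, object],
                      form: str,
                      gesture: Optional[str],
                      created: datetime,
                      page: int) -> Dict[str, object]:
    """Combine inherited, implicit and user-entered attributes."""
    attrs: Dict[str, object] = {}
    # 2) Attributes inherited from the document (author, title, ...).
    attrs.update(document_attrs)
    # 3) Implicit attributes derived from the annotation itself.
    attrs.update({"created": created, "page": page, "form": form})
    if gesture in GESTURE_MEANINGS:
        attrs["meaning"] = GESTURE_MEANINGS[gesture]
    # 1) Attributes entered by the user; applied last so they win
    #    on key collisions (an assumed policy).
    attrs.update(user_attrs)
    return attrs
```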

[0028] After the attributes are assigned to each annotation at step
S140, the control routine proceeds to step S150, where the
annotations are organized, ordered or ranked by the assigned
attributes. Subsequently, the control routine proceeds to step S160,
where the annotations are displayed for the user. The control routine
then proceeds to step S170, where the control routine stops.

[0029] The system 10 visually presents the annotations in context
using different list views. Lists are ordered or filtered by the
attributes described above. The system 10 allows the reader to
navigate between these views and the underlying electronic documents.
Examples of ordered lists include:

1) Ordered by time. This view is analogous to a reader's notebook,
but also automatically includes the context of each annotation, as
shown in Fig. 4, without further effort by the user;

2) Filtered by attributes. Passages across a number of documents are
listed in one view;

3) Filtered by the type of adjacent material. For example,
annotations of pictures along with the pictures themselves; and

4) Filtered by the content of adjacent material. For example,
annotated passages mentioning patent leather shoes are ranked in
relatedness using known information retrieval techniques.
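
The first two list views can be sketched as simple sort and filter
operations. The sketch assumes each annotation carries its extracted
context and a dictionary of attributes; the names Annotation,
ordered_by and filtered_by are hypothetical, and the ranking by
relatedness of view 4 is not covered here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Annotation:
    context: str  # text extracted around the annotation
    attributes: Dict[str, object] = field(default_factory=dict)


def ordered_by(annotations: List[Annotation], key: str) -> List[Annotation]:
    """List view ordered by one attribute, e.g. the creation time."""
    return sorted(annotations, key=lambda a: a.attributes[key])


def filtered_by(annotations: List[Annotation], key: str,
                value: object) -> List[Annotation]:
    """List view restricted to annotations with a given attribute value."""
    return [a for a in annotations if a.attributes.get(key) == value]
```

Each entry keeps its context alongside its attributes, so either view
can show the annotated passage without reopening the source document.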

[0030] It is to be understood that the term annotation as used herein
is intended to include text, digital ink, audio, video or any other
input associated with a document. It is also to be understood that
the term document is intended to include text, video, audio and any
other media and any combination of media. Further, it is to be
understood that the term text is intended to include text, digital
ink, audio, video or any other content of a document, including the
document's structure.



Claims 

1. A method for displaying context embedded annotations from at least
one document, comprising:

extracting at least one annotation from the at least one document;

extracting a context portion for each at least one annotation from a
corresponding one of the at least one document; and

assigning at least one attribute to each at least one extracted
annotation.

2. The method of claim 1, wherein the step of assigning at least one
attribute comprises assigning at least one user-defined attribute to
each at least one annotation.

3. The method of claim 1, wherein the step of assigning at least one
attribute comprises assigning at least one document-based attribute
to each at least one annotation.

4. The method of any one of claims 1 to 3, wherein the step of
extracting at least one annotation comprises:

grouping marks by time and space into at least one collection,
wherein each collection forms an annotation;

segmenting the at least one document into a plurality of segments
based on the at least one annotation;

determining a minimum context portion for each annotation, from the
segments; and

determining the context portion based on the segments surrounding the
minimum context.

5. The method of any one of claims 1 to 4, further comprising:

ordering the at least one annotation based on the at least one
attribute; and

displaying an ordered list of the at least one annotation along with
the corresponding context.

6. An apparatus for displaying annotations from at least one
document, the apparatus comprising:

a memory that stores the at least one document;

a processor that extracts at least one annotation and a context
portion of the at least one document corresponding to each at least
one annotation, that assigns at least one attribute to each at least
one extracted annotation, and that orders the at least one
annotations based on the at least one assigned attribute; and

a display that displays an ordered list of the at least one
annotations and the at least one corresponding context portion.

7. The apparatus of claim 6, wherein, for each at least one
annotation, the processor assigns the at least one attribute to that
annotation based on at least one document-based attribute.

8. The apparatus of claim 6, wherein, for each at least one
annotation, the processor assigns the at least one attribute to that
annotation based on at least one attribute derived from the context
portion corresponding to that annotation.

9. The apparatus of claim 6, wherein when one of the 
at least one annotation is a mark annotation the 
processor assigns the at least one attribute based 
upon that mark annotation. 

10. The apparatus of any one of claims 6 to 9, wherein:

the processor extracts the at least one annotation by grouping at
least one user generated mark on the at least one document based on
the time of the at least one mark and a location within the document
of the at least one mark into at least one collection, wherein each
collection forms a single annotation;

the processor determines a minimum context portion of the
corresponding document for each annotation; and

the processor determines the context for each annotation based on the
annotation and the minimum context.